This  paper  briefly  surveys  some  aspects  of  robust  inference 
for  time  series,  and  gives  an  indication  of  the  current  state  of 
knowledge  in  other  problem  areas.  Basic  notions  of  robustness  are 
stated,  and  technical  difficulties  associated  with  the  time  series 
case  are  mentioned.  Some  models  for  time  series  with  outliers  are 
given.  Least-squares  procedures  lack  robustness  for  such  models 
and  robust  alternatives  are  described.  Issues  of  adaptivity  versus 
robustness  are  briefly  mentioned.  Robustness  problems  involving 
dependency  are  discussed.  Algorithms  for  robust  data  smoother- 
1.  INTRODUCTION 

The body of theoretical work on time series utilizes primarily one of two
mathematically convenient fictions, namely either (i) a second-order
description, or (ii) a Gaussian assumption, in which case a second-order
description is a complete description. The second-order formulation is at the
base of many important concepts and structures in time series, including Wold's
decomposition, the spectral representation, and prediction theory. In all of
these one has the convenience of utilizing Hilbert space methods (for details
see the appropriate sections of the recent book by Grenander, 1981). On the
other hand the Gaussian assumption allows one to utilize the parametric method
of maximum likelihood for time series models, early work in this area being due
to Whittle (1953, 1962). The nonparametric method for time series consists of
estimating the spectrum, a second-order description in the frequency domain, by
a variety of methods based on the periodogram.

Unfortunately, many time series encountered in practice are quite decidedly
non-Gaussian, as many practitioners know, and, correspondingly, second-order
descriptions are far from adequate. Series often contain anomalies of numerous
kinds, including local bumps or bursts, shifts in level, nonstationarities of
various kinds, and isolated outliers. Least-squares and other Gaussian
maximum-likelihood procedures are quite non-robust toward such phenomena.
Here we shall be primarily concerned with methods which are geared to deal
well with a not-too-large fraction of local bumps or bursts and isolated
outliers.

It cannot be stressed too strongly that: (i) second-order descriptions are
woefully inadequate for representing such phenomena, and (ii) a Gaussian
marginal distribution for a series hardly ensures that potent versions of such
phenomena do not exist. For a striking and graphic portrayal of these two facts
see the example displayed in Figures 4 through 11 of Martin and Thomson
(1982). The essence of the example in these figures is that a time series often
has a moderate to large amount of low frequency energy, with corresponding
sample paths having broad peaks and valleys, so that outliers and bumps can be
modest to small on the scale of the process (e.g., as measured by the range of
the data), while being quite large on a local scale and clearly visible to the
eye.

This last observation leads us to give the following loose definition of an
outlier in a time series. An outlier $y_t$ is a data value which lies well
outside of the central mass (say 95% of the mass) of the conditional density
$f(y_t \mid Y^{t-1})$, where the conditioning variables $Y^{t-1}$ consist of
all the past observations $Y^{t-1} = (y_1, \ldots, y_{t-1})$. This density is
often called the observation prediction density. Since we seldom get our hands
on such a conditional density, it is convenient and natural to cast the
definition somewhat differently. Let $\hat{y}_t^{t-1}$ denote a "good"
predictor of $y_t$ given the past $Y^{t-1}$. In particular $\hat{y}_t^{t-1}$
should have the kind of resistance/robustness properties discussed in the next
section, so that this predictor is not unduly affected by outliers in
$Y^{t-1}$ (such a predictor appears in Section 8). Then $y_t$ is an outlier if
the prediction residual $r_t = y_t - \hat{y}_t^{t-1}$ has magnitude large
compared with a good scale measure $s_r$ for all of the residuals $r_t$,
$t = 1, \ldots, n$. For example one might well take $s_r$ to be the suitably
scaled interquartile distance of the $r_t$. These definitions can be
generalized in a more or less obvious way to cover the case of a "patch" or
"bump" of outliers, $y_t, \ldots, y_{t+k}$.
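The residual-based definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the predictor is the median of the previous `window` observations (a hypothetical stand-in for the robust predictor of Section 8), the scale $s_r$ is the interquartile distance divided by 1.349 so that it matches the standard deviation at the Gaussian, and the cutoff of 3 is an arbitrary choice. The function name `flag_outliers` is invented for the example.

```python
import math
import statistics

def flag_outliers(y, window=5, cutoff=3.0):
    """Flag y[t] when its prediction residual is large relative to a
    robust scale computed from all prediction residuals."""
    idx = list(range(window, len(y)))
    # Stand-in robust predictor: median of the previous `window` values.
    resid = [y[t] - statistics.median(y[t - window:t]) for t in idx]
    # Robust scale s_r: interquartile distance, rescaled for the Gaussian.
    q1, _, q3 = statistics.quantiles(resid, n=4)
    s_r = (q3 - q1) / 1.349
    return [t for t, r in zip(idx, resid) if abs(r) > cutoff * s_r]

# A slowly varying series with one outlier that stays inside the overall
# range of the data but is large on a local scale.
y = [10.0 * math.sin(2 * math.pi * t / 400) for t in range(400)]
y[100] -= 4.0
flagged = flag_outliers(y)
print(flagged)
```

The contaminated value lies well inside the range of the series, so an instantaneous rule based on the marginal spread of the data would not single it out, whereas the prediction residual does.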

The above comments should make the following point clear. One cannot hope to
have a good method for dealing with outliers in time series by using only an
instantaneous nonlinear transformation of the data, i.e., treatment of the
form $\tilde{y}_t = g(y_t)$. True, some time series will contain outliers
which are large on the scale of the process, and in such cases such a procedure
may prevent the worst consequences. Note, however, that $\tilde{y}_t$ will in
general still be an outlier in the sense given above, for this value is
specified without regard to the neighboring values $y_{t-1}$, $y_{t+1}$, etc.
of the series. More sophisticated procedures are called for and these will be
discussed in Sections 5, 7 and 8. Sections 2 and 3 review robustness concepts
for independent observations and for time series, respectively. Some time
series outlier models are mentioned in Section 4. Some robust alternatives to
least-squares and Gaussian maximum-likelihood procedures are introduced in
Section 5. Section 6 comments on fully adaptive estimates. Section 7 deals with
some aspects of robustness toward dependency, both with and without outliers
simultaneously present. Finally Section 8 briefly describes robust data
smoother-cleaner algorithms, and gives an application to radar glint noise.


2.  ROBUSTNESS  CONCEPTS  FOR  INDEPENDENT  OBSERVATIONS 


The  following  comprise  four  robustness  concepts  in  moderately  wide  use 
today:  (1)  Resistance;  (2)  Efficiency  Robustness;  (3)  Min-Max  Robustness;  (4) 
Qualitative  Robustness.  These  concepts  have  been  applied  mainly  to  situations 
involving  only  independent  observations  until  quite  recently. 

Resistance,  a  term  due  to  J.  W.  Tukey  (1976),  is  in  fact  a  term  distinct  from 
robustness.  It  is  the  data-oriented  version  of  the  probability  based  word  robust. 
As  such  it  is  the  basic  primitive  form  of  robustness  which  captures  the  essential 
goals  of  robust  estimation,  namely  large  changes  in  a  smallish  fraction  of  the 
data,  e.g.,  gross  outliers,  should  have  only  a  small  effect  on  the  estimate.  Small 
changes  in  all  the  data,  e.g.,  rounding  (or  fine  quantization),  should  have  only  a 
small  effect  on  the  estimate.  As  is  well  known,  least-squares  and  other  Gaussian 
maximum-likelihood  procedures  lack  resistance,  and  hence  resistant/robust 
procedures  have  been  invented. 

Of the three bona fide robustness terms, the notion of efficiency robustness
(Tukey, 1960; Mosteller and Tukey, 1977) is the oldest and least mathematical
concept, and hence the one most accessible to applied statisticians. Let
$V_S(F)$ denote a variance standard of reference at data distribution $F$, and
for the moment assume we are in one of those special situations where unbiased
estimates exist. $V_S(F)$ might be the Cramer-Rao bound for either asymptotic
or finite-sample cases. It would preferably be the Pitman bound in the latter
case, when dealing with problems such as location and scale where the Pitman
bound can by some means be evaluated (Pregibon and Tukey, 1981). Alternatively,
$V_S(F)$ may be simply the variance of the best known estimate at distribution
$F$. With $V_T(F)$ the variance of estimate $T$ at distribution $F$, the
efficiency of $T$ at $F$ is

$$\mathrm{EFF}(T,F) = \frac{V_S(F)}{V_T(F)}. \tag{1}$$

An efficiency-robust estimate $T$ is one whose efficiency is high at the
nominal distribution $F_0$ (often Gaussian), and also high at strategically
chosen alternative distributions $F_1, F_2, \ldots, F_K$ (usually heavy-tailed
outlier-generating distributions). Often efficiencies
$\mathrm{REFF}(T, T_{LS}, F)$, relative to least-squares or other Gaussian
maximum-likelihood estimates, are used with the variance $V_{LS}$ or
$V_{GMLE}$ replacing $V_S$ in (1). For problems where bias is unavoidable, and
this is the case for almost all truly realistic robustness problem
formulations, one will use mean-squared errors in place of variances in (1),
and also compare biases as well.
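As a rough numerical illustration of a relative efficiency of the form (1) with $V_{LS}$ in the numerator, the following sketch estimates by Monte Carlo the efficiency of the sample median relative to the sample mean for a location problem. The sample size, the number of replications, and the contaminated Gaussian alternative (10% of the observations with ten times the scale) are assumptions made only for this example, and all function names are invented.

```python
import random
import statistics

def mc_variance(estimator, sampler, n=50, reps=2000, seed=0):
    """Monte Carlo variance of `estimator` over `reps` samples of size n."""
    rng = random.Random(seed)
    values = [estimator([sampler(rng) for _ in range(n)]) for _ in range(reps)]
    return statistics.pvariance(values)

def gaussian(rng):
    return rng.gauss(0.0, 1.0)

def contaminated(rng, frac=0.1, scale=10.0):
    # With probability `frac`, draw from a Gaussian with `scale` times the spread.
    return rng.gauss(0.0, scale if rng.random() < frac else 1.0)

def reff_median(sampler):
    """Variance of the mean (the LS estimate) over variance of the median."""
    return (mc_variance(statistics.fmean, sampler)
            / mc_variance(statistics.median, sampler))

reff_gauss = reff_median(gaussian)
reff_cont = reff_median(contaminated)
print(round(reff_gauss, 2), round(reff_cont, 2))
```

The median does well at the heavy-tailed alternative but gives up a good deal at the Gaussian, which is why it is not by itself regarded as efficiency robust; an efficiency-robust estimate must score well at both.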

Huber (1964) introduced min-max robust estimates in his by-now classic paper
on robust estimates of location. Here the asymptotic variance $V(T,F)$ of
estimator $T$ at distribution $F$ is the loss, and the statistician wishes to
minimize, over a family of estimates, the maximum of $V(T,F)$ over a family of
distributions. Huber showed that such min-max estimates exist in the class of
location M-estimates $\hat{\mu}$ obtained by solving

$$\min_{\mu} \sum_{i=1}^{n} \rho\!\left(\frac{y_i - \mu}{c\,s}\right) \tag{2}$$


with $\rho$ symmetric and convex, the $y_i$ independent and identically
distributed (i.i.d.), and $y_i \sim F(\cdot - \mu)$. Here $s$ is a robust scale
estimate and $c$ is a tuning constant adjusted to obtain high efficiency
robustness. Equivalently $\hat{\mu}$ is a solution of

$$\sum_{i=1}^{n} \psi\!\left(\frac{y_i - \hat{\mu}}{c\,s}\right) = 0 \tag{3}$$


with psi function $\psi = \rho'$. We henceforth choose $s = 1$ and absorb $c$
into the definition of $\psi$ for notational convenience. Huber's (1964)
famous min-max solution is based on an $\epsilon$-contaminated family with
standard Gaussian central distribution, and the saddle-point pair
$(T_0, F_0)$ has $T_0 = \hat{\mu}_0$ obtained from (3) with $\psi = \psi_0$
given by

$$\psi_0(t) = \begin{cases} t, & |t| < K \\ K\,\mathrm{sgn}(t), & |t| \ge K \end{cases} \tag{4}$$

with $K = K(\epsilon)$ determined by the contamination fraction $\epsilon$.
Other families yield other saddle-point $\psi$-functions (see for example
Huber, 1981).
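A minimal sketch of a location M-estimate of the form (3) with the psi function (4) follows. Several choices here are assumptions of the example rather than part of the text: the scale $s$ is the median absolute deviation divided by 0.6745 and is held fixed, the constant $K = 1.345$ is a conventional tuning value rather than one derived from a particular $\epsilon$, and the equation is solved by a simple fixed-point iteration started at the median.

```python
import statistics

def huber_psi(t, k=1.345):
    """Huber's psi: the identity for |t| < k, clipped to +/-k otherwise."""
    return max(-k, min(k, t))

def huber_location(y, k=1.345, tol=1e-10, max_iter=200):
    """Solve sum(psi((y_i - mu) / s)) = 0 for mu with the scale s held fixed."""
    mu = statistics.median(y)
    # Robust scale: median absolute deviation, rescaled for the Gaussian.
    s = statistics.median(abs(v - mu) for v in y) / 0.6745
    if s == 0:
        return mu
    for _ in range(max_iter):
        # Move mu by the average clipped residual, expressed in data units.
        step = s * sum(huber_psi((v - mu) / s, k) for v in y) / len(y)
        mu += step
        if abs(step) < tol:
            break
    return mu

y = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]
mu_hat = huber_location(y)
print(statistics.fmean(y), mu_hat)
```

The single gross value drags the sample mean far from the bulk of the data, while its contribution to the M-estimate is capped at $K$ scale units.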

Qualitative robustness was introduced by Hampel (1968, 1971), and this is a
fundamental continuity property which is the probabilistic counterpart of
Tukey's data-oriented term resistance. Let $Y_1, \ldots, Y_n$ be i.i.d. with
values in $R^k$ and common distribution $F$, and let
$T_n = T_n(Y_1, \ldots, Y_n)$ define a sequence of estimates with values in
$R^p$ for sample sizes $n = 1, 2, \ldots$. This sequence induces the sequence
of maps

$$T_n : F \to \mathcal{L}_{T_n}(F) \tag{5}$$

where $\mathcal{L}_{T_n}(F)$ is the law of $T_n$ at $F$. Then $T_n$ is said to
be qualitatively robust at $F$ (or in a neighborhood of $F$, or everywhere) if
the sequence of maps (5) is equicontinuous at $F$ (or in a neighborhood of
$F$, or everywhere), using the Prohorov distance on the metric spaces where
$F$ and $\mathcal{L}_{T_n}(F)$ are elements. The Prohorov metric incorporates
the possibility of both gross outliers and rounding errors in
$\epsilon$-neighborhoods in a natural manner, and thus is extremely attractive
for use in a robustness definition.

When $\{T_n\}$ is obtained from a functional $T = T(F)$ defined on a subset of
the family of all distributions by evaluation of $T$ at the empirical
distribution function (e.d.f.) $F_n$, $T_n = T(F_n)$, one set of sufficient
conditions for $\{T_n\}$ to be robust at $F$ is: (i) $T_n = T(F_n)$ is a
continuous function on $R^n$ for each $n = 1, 2, \ldots$, and (ii) $T$ is
continuous at $F$. For Huber's class of location M-estimates (3), $T$ is
defined implicitly by

$$\int \psi\bigl(y - T(F)\bigr)\, dF(y) = 0. \tag{6}$$

In essence robustness is achieved by choosing $\psi$ to be bounded and
monotone. (In addition, uniqueness of the solution $T(F)$ at $F$ is needed;
see Huber, 1981.)

Of the above concepts I regard resistance and qualitative robustness as
fundamental, with efficiency robustness a close companion. Qualitative
robustness is a principle which should be regarded on a par with other
principles of statistics such as sufficiency, unbiasedness, etc. Whenever
possible a statistic should be selected to have the property of qualitative
robustness, all other things being relatively equal. Thus from now on the term
robust, without other qualifiers, will be taken to mean qualitatively robust.

Since some rather ridiculous estimates (such as $T = c$, with $c$ a constant)
are robust, one needs to combine the principle with some other measure, and
efficiency robustness is a natural candidate (see Beran 1977a, 1977b, for
notable efforts to obtain full efficiency and robustness simultaneously).

Min-max robustness is more or less frosting on the cake: it is nice to have,
but one shouldn't lose any sleep over not obtaining it. Also one should not, as
has been done in some of the recent engineering literature, take min-max
robustness as the guiding concept, at least not without some circumspection.
The main justification for concentrating on min-max robustness would be that
one already has a basic continuity property in hand, but that the modulus of
continuity is so bad that something like a good min-max solution would be
appealing. Note, however, that one must demonstrate that the modulus of
continuity is indeed bad, and this is a somewhat subjective matter.

There are two important concepts affiliated with the core ideas of robustness
which are also due to Hampel. The first is the breakdown point (Hampel, 1968,
1971), a global (asymptotic) measure which is essentially the largest fraction
of contamination which an estimator can stand without breaking down completely
by virtue of being taken to the boundary of the parameter space. The second
concept, the influence curve (Hampel, 1974), is an asymptotic infinitesimal
(or local) measure which gives the effect of a vanishingly small fraction of
contamination of specific value on an estimate as the sample size tends to
infinity.

Influence curve considerations lead one to use psi-functions (e.g., $\psi$ in
Eq. (3)) that are continuous. In the sequel we take boundedness and continuity
of $\psi$ to be the essential features needed for robustness. Non-monotone
$\psi$ can be used by computing one-step Newton solutions to equations like
(3), starting with a near-solution obtained with a monotone $\psi$.

Both the above concepts have finite-sample versions. Tukey's sensitivity
curves or stylized sensitivity curves (see Andrews et al., 1972), and Mallows'
empirical influence curves (Mallows, 1976) are finite-sample versions of the
influence curve. Hodges (1967) introduced the precursor of the breakdown
point, and recently Donoho (1982) has stressed the relative importance of
finite-sample breakdown points.
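A finite-sample sensitivity curve can be computed directly. The sketch below uses the common form $n\,[T_n(y_1, \ldots, y_{n-1}, x) - T_{n-1}(y_1, \ldots, y_{n-1})]$ and compares the sample mean with the sample median; the eight data values are made up for the illustration and the function name is invented.

```python
import statistics

def sensitivity_curve(estimator, sample, x):
    """n * (T(sample + [x]) - T(sample)): the effect of one added point x."""
    n = len(sample) + 1
    return n * (estimator(sample + [x]) - estimator(sample))

sample = [-1.5, -0.8, -0.3, 0.0, 0.2, 0.7, 1.1, 1.6]
for x in (1.0, 10.0, 1000.0):
    sc_mean = sensitivity_curve(statistics.fmean, sample, x)
    sc_median = sensitivity_curve(statistics.median, sample, x)
    print(x, round(sc_mean, 3), round(sc_median, 3))
```

For the mean the curve grows linearly in $x$ without bound, while for the median it stays constant once $x$ passes the central order statistics, which is the finite-sample picture of a bounded influence curve.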

Bounded-influence regression is an approach to regression which was
stimulated by the notion that an estimator's influence curve should be bounded.
This problem area has seen vigorous attention by a small group of researchers
(Hampel, 1975, 1978; Mallows, 1976; Krasker and Welsch, 1982; Maronna, Bustos
and Yohai, 1979). This topic deserves a brief introduction, both for its own
sake, and also because the approach may be adapted for robust estimation of
certain time series models. Consider the regression model

$$y_i = x_i^T \beta + \epsilon_i, \qquad i = 1, \ldots, n \tag{7}$$

where the $\epsilon_i$ are i.i.d. with common symmetric distribution
$F_\epsilon$, and $\beta^T = (\beta_1, \ldots, \beta_p)$. M-estimates
$\hat{\beta}_M$ for regression are solutions of the estimating equation

$$\sum_{i=1}^{n} x_i\, \psi\bigl(y_i - x_i^T \hat{\beta}_M\bigr) = 0 \tag{8}$$

obtained by minimizing the regression analogue of (2). It is assumed that
$\psi$ is bounded, continuous and monotonic.

First suppose that the $x_i$ are known exactly (i.e., are observed without
error) and the specification (7) with regard to the $x_i^T \beta$ is correct.
Then the only source of distributional difficulty is the $\epsilon_i$, which
may contain outliers due to $F_\epsilon$ being heavy-tailed. In this
formulation $\hat{\beta}_M$ is robust according to Hampel's asymptotic
definition. There may, however, still be some finite-sample problems caused by
so-called X-leverage points (see Huber, 1981, Chapter 7; Belsley, Kuh and
Welsch, 1980).

On the other hand, suppose the $x_i$ are occasionally observed with large
errors (say keypunch errors for example), and/or the specification (7) is
incorrect in any one of a variety of ways (e.g., a mixture model for $\beta$
with $P(\beta = \beta_0) = 1 - \gamma$ and $P(\beta = \beta_1) = \gamma$ with
$\gamma$ small). Then M-estimates $\hat{\beta}_M$ are not at all robust. In
order to obtain regression estimates which are robust against such
possibilities, it is desirable to use a bounded-influence (BI) regression
estimate $\hat{\beta}$ which is the solution of an equation of the form

$$\sum_{i=1}^{n} \eta\bigl(x_i,\; y_i - x_i^T \hat{\beta}\bigr) = 0 \tag{9}$$

where $\eta(\cdot,\cdot)$ is a bounded and continuous function on
$R^p \times R^1$. This will guard against outliers/model uncertainty in both
the independent variables, or carriers $x_i$, and the residuals $\epsilon_i$.
It would be quite dangerous to rely on the M-estimate $\hat{\beta}_M$ if one
were not quite sure about the purity of the $x_i$.

The reasons for pointing out the above features of ordinary regression
M-estimates and BI regression alternatives are twofold. First of all there are
certain problems in communications theory (and practice) where exact knowledge
of the $x_i$ is virtually assured. This is the case, for example, where
$x_i^T \beta$ represents a signal of known structure, such as a constant signal
(i.e., a location problem) or a sinusoidal signal with unknown amplitude (where
$p = 1$), or with unknown amplitude and phase (where $p = 2$). We discuss such
problems in Section 7. On the other hand, when one is fitting autoregressive
(AR) or autoregressive-moving-average (ARMA) models, and one has an additive
outliers (AO) model, as discussed in Section 4, the carriers are quite
definitely contaminated and observed with error. For this situation
autoregression M-estimates are hopelessly bad, and some form of
bounded-influence regression is called for.

Among the topics which deserve mention, but are otherwise beyond the scope of
this paper, I would mention: (i) quantitative robustness (see Huber, 1981,
Chapter 1); (ii) a decision theoretic framework for robustness (Millar, 1981);
(iii) asymptotically shrinking $n^{-1/2}$ neighborhood formulations (Bickel,
1982); (iv) finite-sample min-max results for testing and confidence intervals
(Huber, 1981, Chapter 10); (v) Hampel's extremal problem (Huber, 1981,
Chapter 11).


3.  ROBUSTNESS  CONCEPTS  FOR  TIME  SERIES 


Although the fundamental continuity idea behind robustness has a simple and
immediate appeal, both the definition and the proofs of sufficient conditions
are highly technical (even the need for the equicontinuity part of the
definition requires a little explanation). This is unfortunate because it makes
all levels of detail quite inaccessible to the practitioner or engineer.
Resistance is a much more palatable concept in this regard, but even this
concept may require careful verification for complex estimates. Things get even
more complicated when one tries to provide an adequate definition of
qualitative robustness for time series problems.

On the other hand, it is quite important to have a solid theory as a
cornerstone from which to build. If the theory is complex, as is now the case,
then the theoretician has a responsibility to communicate the central concepts
and results as clearly and simply as possible to potential users of proposed
robust procedures.


Parameter Estimation

In recognition of the need for a suitable version of qualitative robustness for
time series parameter estimates, the following researchers have made
contributions to the problem: Papantoni-Kazakos and Gray (1979), Cox (1981),
Bustos (1981) and Boente, Fraiman and Yohai (1982).

An issue arising in the time series case is that of specifying the metric, and
hence the topology, for the space of sample paths. There are a variety of ways
to do this, as is reflected in the above references, and what is required is a
reasonable balance so that the topology is neither too weak (in which case no
estimates are robust) nor too strong (in which case all estimates are robust).


Papantoni-Kazakos and Gray (1979) work with the so-called $\bar{\rho}$
(rho-bar) metric. Their definition has a defect in the arbitrariness of the
per-letter metric $\rho_0$ used to arrive at a final $\bar{\rho}$ metric. In
order to deal with arbitrarily heavy-tailed processes, for example, it is
necessary to choose $\rho_0$ bounded. Cox's (1981) definition circumvents this
difficulty, but only applies to estimates whose functional versions (analogous
to $T(F)$ in (6)) depend on only a finite-dimensional marginal distribution
for the process.

The Boente, Fraiman and Yohai (1982) work, initiated by Yohai, seems to be the
most attractive. A major feature of their definition is that the metric $d_n$
they use for sample paths of length $n$ is extremely natural and transparent:

$$d_n(y, y') = \inf\left\{ \gamma : \frac{\#\{ i : |y_i - y'_i| \ge \gamma \}}{n} \le \gamma \right\} \tag{10}$$

where $\#\{ i : |y_i - y'_i| \ge \gamma \}$ is the number of coordinates in the
two observed sample paths $y = (y_1, \ldots, y_n)$ and
$y' = (y'_1, \ldots, y'_n)$ which differ by at least $\gamma$. Thus $d_n$ is
the smallest $\gamma$ such that the fraction of coordinates whose difference
exceeds $\gamma$ is no greater than $\gamma$. This is a data-based distance
which allows for both rounding up to an amount $\gamma$, and a fraction
$\gamma$ of gross errors in a $\gamma$-neighborhood. Of course the final
definition of robustness involves some additional structure, and also letting
$n \to \infty$.
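The distance (10) is easy to evaluate for two observed paths. In the sketch below the infimum is computed from the sorted absolute differences: if the $k$ largest differences are allowed to count as gross errors, the cost is $\max(k/n, \text{next largest difference})$, and the distance is the smallest such cost over $k = 0, \ldots, n$. The function name `bfy_distance` is invented for the example.

```python
def bfy_distance(y, y_prime):
    """inf{g : #{i : |y_i - y'_i| >= g} / n <= g} for two equal-length paths."""
    n = len(y)
    # Absolute coordinate differences, largest first, padded with a final 0.
    d = sorted((abs(a - b) for a, b in zip(y, y_prime)), reverse=True) + [0.0]
    # Treating the k largest differences as gross errors costs k/n; the
    # remaining differences must then be covered by rounding of size d[k].
    return min(max(k / n, d[k]) for k in range(n + 1))

clean = [0.0, 0.0, 0.0, 0.0]
one_gross_error = [0.0, 0.0, 0.0, 10.0]   # one coordinate in four is wild
rounded = [0.01, 0.01, 0.01, 0.01]        # every coordinate moved slightly

print(bfy_distance(clean, one_gross_error), bfy_distance(clean, rounded))
```

One wild coordinate out of four gives a distance equal to the contaminated fraction, no matter how large the error is, while a small perturbation of every coordinate gives a distance equal to the size of the perturbation.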

Consider an estimate $T_n$ obtained by solving the estimating equation of
rather general form

$$\sum_{t=1}^{n} \psi_t(y_1, \ldots, y_n; T_n) = 0 \tag{11}$$

where $y_1, \ldots, y_n$ is the observed segment of a time series. The
essential requirement needed to ensure robustness is that the psi-functions
$\psi$ be bounded and continuous. Specific examples are given in Section 5.


Filtering and Smoothing Problems

In filtering and smoothing problems we have as many estimates, call them
$\hat{x}_t$, $t = 1, \ldots, n$, as there are observed data values
$y_1, \ldots, y_n$. Thus a filter or a smoother is a mapping $S_n$ from $R^n$
to $R^n$. It is not clear exactly what constitutes an appropriate definition of
qualitative robustness for problems of this type. We surely want some form of
continuity for the sequence of maps $S_n : \mu \to \mu_{S_n}(\mu)$, where
$\mu$ is the measure for the stationary process $y_t$ and $\mu_{S_n}(\mu)$ is
the measure for $\hat{x}_1, \ldots, \hat{x}_n$. Consistency is not a
possibility in filtering and smoothing problems, and evidently equicontinuity
may not be as crucial here. However, this remains to be determined.

At the very least, we would require a resistance version of robustness for
the $\hat{x}_t$, $t = 1, \ldots, n$. This amounts to requiring that the map
$S_n$ defines a bounded and continuous functional of $\mu_n$, the measure for
$y_1, \ldots, y_n$. Boundedness ensures that no single $y_t$ can spoil the
$\hat{x}_t$, and continuity ensures that small rounding errors cannot have a
large effect. Thus we would require that $S_n = S_n(\mu_n)$ be a weakly
continuous function on the space of measures $\mu_n$ for
$y^T = (y_1, \ldots, y_n)$. (Compare this with Huber, 1981, Chapter 1.) Linear
filters and smoothers lack resistance; appropriate bounded and continuous
nonlinearity is required to achieve robust/resistant filters and smoothers.
The smoother-cleaners of Section 8 have this property.
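The lack of resistance of a linear smoother is easy to see numerically. The sketch below compares a running mean with a running median on a short series before and after one value is replaced by a gross outlier. The running median is used only as a simple stand-in for a resistant nonlinear smoother; it is not the smoother-cleaner of Section 8, and the window width and data are assumptions of the example.

```python
import statistics

def running(stat, y, half_width=2):
    """Apply `stat` to a centered window; the window shrinks at the ends."""
    n = len(y)
    return [stat(y[max(0, t - half_width):min(n, t + half_width + 1)])
            for t in range(n)]

y = [float(t) for t in range(11)]   # a clean linear trend
y_bad = list(y)
y_bad[5] = 500.0                    # one gross outlier

def max_change(stat):
    """Largest change in the smoothed values caused by the single outlier."""
    return max(abs(a - b) for a, b in zip(running(stat, y), running(stat, y_bad)))

mean_shift = max_change(statistics.fmean)
median_shift = max_change(statistics.median)
print(mean_shift, median_shift)
```

The change in the running mean grows in proportion to the size of the outlier, whereas the change in the running median is bounded by the local spacing of the clean data.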


4.  TIME  SERIES  MODELS  FOR  OUTLIERS 


In  some  previous  work  I  have  concentrated  on  the  robust  estimation  of  AR 
and  ARMA  model  parameters,  and  robust  spectral  density  estimation,  utilizing 
the  following  two  distinct  outlier  generating  models  for  observed  time  series  yt 
(see  Martin,  1981,  and  Martin  and  Thomson,  1982,  and  the  references  therein). 


The Innovations Outliers (IO) Model


$$x_t = \mu + \sum_{l=0}^{\infty} h_l\, \epsilon_{t-l} \tag{12}$$

where the $\epsilon_t$ are i.i.d. with common distribution $F$ which is
symmetric and possibly heavy-tailed, $\sum_l h_l^2 < \infty$, and $\mu$ is the
location parameter for $x_t$. Then let

$$y_t = x_t \tag{12'}$$

be perfect observations of the $x_t$ process.


The Additive Outliers (AO) Model


$$x_t = \mu + \sum_{l=0}^{\infty} h_l\, \epsilon_{t-l} \tag{13}$$

with $\epsilon_t$ i.i.d. Gaussian, $\sum_l h_l^2 < \infty$, and

$$y_t = x_t + v_t \tag{14}$$

where $P(v_t = 0) = 1 - \gamma$ with $\gamma$ small. The AR and ARMA models
are special cases of the general linear processes (12) and (13).

For the AR case the IO model corresponds roughly to a finite parameter linear
regression model with heavy-tailed error distribution. However, some quirks of
the model exist, and will be mentioned in the next section. The $v_t$ in the AO
model represent outliers, either in patches or in isolation, and in the AR case
we have the analogue of a linear regression model with Gaussian residuals, but
with errors in the variables (EV).

The AO model is a special case of a more general kind of $x_t$ perturbation
model

$$y_t = (1 - z_t)\, x_t + z_t\, w_t \tag{15}$$

with $z_t$ a binary series with $P(z_t = 1) = \gamma$ (see Yohai and Bustos,
1982). We shall also refer to this as an AO model, even though the term
replacement model might equally well be used.
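The difference between the two outlier models can be made concrete by simulating first-order autoregressions. In this sketch the heavy-tailed innovation distribution for the IO model is a contaminated Gaussian, and the additive outliers $v_t$ take the values $\pm 10$ with probability $\gamma = 0.05$; these choices, the coefficient 0.5, the zero location $\mu = 0$, and the zero starting value are all assumptions of the example.

```python
import random

def simulate_ar1(phi, innovations):
    """x_t = phi * x_{t-1} + e_t, started at zero."""
    x, prev = [], 0.0
    for e in innovations:
        prev = phi * prev + e
        x.append(prev)
    return x

def heavy_tailed(rng, frac=0.1, scale=10.0):
    # Contaminated Gaussian: symmetric and heavy-tailed.
    return rng.gauss(0.0, scale if rng.random() < frac else 1.0)

rng = random.Random(1)
n, phi, gamma = 500, 0.5, 0.05

# IO model: heavy-tailed innovations, perfectly observed (y_t = x_t).
y_io = simulate_ar1(phi, [heavy_tailed(rng) for _ in range(n)])

# AO model: Gaussian core process plus occasional additive outliers v_t.
x = simulate_ar1(phi, [rng.gauss(0.0, 1.0) for _ in range(n)])
v = [rng.choice((-10.0, 10.0)) if rng.random() < gamma else 0.0 for _ in range(n)]
y_ao = [xt + vt for xt, vt in zip(x, v)]

print(sum(1 for vt in v if vt != 0.0), "additive outliers in", n, "observations")
```

In the IO path a large innovation is propagated through the autoregression and decays at the rate $\varphi^k$, so the model structure is respected. In the AO path the outlier affects only the single observation $y_t$, which then enters later regressions as a contaminated carrier.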


ARCH Autoregressions

Recently we have also been studying the properties of the following type of
ARCH autoregressions and associated parameter estimation problems (Nemec and
Martin, 1983). Let

$$y_t = \gamma + \varphi_1 y_{t-1} + \cdots + \varphi_p y_{t-p} + \epsilon_t \tag{16}$$

with $\epsilon_t$ an ARCH process as defined by Engle (1981):

$$\epsilon_t \mid \epsilon^{t-1} \sim N\bigl(0,\, h(\epsilon^{t-1})\bigr) \tag{17}$$

where $\epsilon^{t-1}$ is the past history of the $\epsilon_t$. The intercept
$\gamma$ accounts for a non-zero mean for $y_t$. The $\epsilon_t$ are
uncorrelated, but not independent. The functions $h$ which we have
concentrated on are of the same form which Engle (1981) emphasizes in the
regression context:

$$h(\epsilon^{t-1}) = \alpha_0 + \alpha_1 \epsilon_{t-1}^2 + \cdots + \alpha_p \epsilon_{t-p}^2 \tag{18}$$

The parameters $\alpha_i$ must satisfy certain minimal constraints to ensure
wide-sense stationarity, and more severe constraints to ensure existence of
higher order moments (see Engle, 1981). The usual Gaussian autoregression is a
special case of (18) obtained by $\alpha_1 = \cdots = \alpha_p = 0$, and
$\alpha_0 = \sigma^2$.

The marginal density for $\epsilon_t$ is more or less heavy-tailed, depending
on the values of the $\alpha_i$. This statement may be inferred by checking
that certain higher order moments do not exist, depending on the values of the
$\alpha_i$, and by empirical checks based on the (easily) simulated ARCH type
$\epsilon_t$. Nonetheless, an open problem concerning the $\epsilon_t$ process
itself is that of determining an analytic form for the stationary distribution
of the $\epsilon_t$, even in the simplest case where

$$h(\epsilon^{t-1}) = \alpha_0 + \alpha_1 \epsilon_{t-1}^2 .$$

ARCH autoregressions are potentially much more useful than IO autoregressions
mainly because their sample paths seem more realistic representations of many
time series sample paths arising in practice.
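The empirical check mentioned above is straightforward to carry out. The following sketch simulates the simplest case $h(\epsilon^{t-1}) = \alpha_0 + \alpha_1 \epsilon_{t-1}^2$ and compares the sample kurtosis with the Gaussian value of 3. The parameter values $\alpha_0 = 1$ and $\alpha_1 = 0.5$, the series length, the burn-in period, and the function names are assumptions chosen for illustration.

```python
import math
import random

def simulate_arch1(n, a0, a1, seed=0, burn_in=500):
    """Draw e_t given the past from N(0, a0 + a1 * e_{t-1}^2)."""
    rng = random.Random(seed)
    e, prev = [], 0.0
    for t in range(n + burn_in):
        prev = rng.gauss(0.0, math.sqrt(a0 + a1 * prev * prev))
        if t >= burn_in:
            e.append(prev)
    return e

def kurtosis(x):
    """Sample fourth moment about the mean over the squared variance."""
    m = sum(x) / len(x)
    m2 = sum((v - m) ** 2 for v in x) / len(x)
    m4 = sum((v - m) ** 4 for v in x) / len(x)
    return m4 / (m2 * m2)

k_gauss = kurtosis(simulate_arch1(20000, 1.0, 0.0))  # alpha_1 = 0: Gaussian case
k_arch = kurtosis(simulate_arch1(20000, 1.0, 0.5))
print(round(k_gauss, 2), round(k_arch, 2))
```

With $\alpha_1 = 0$ the series is i.i.d. Gaussian and the sample kurtosis is close to 3; with $\alpha_1 > 0$ the marginal distribution has visibly heavier tails even though each conditional distribution is Gaussian.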


Regression with Non-Gaussian AR Residuals

In Section 7 we discuss robust point estimation of $\beta$ in the following
model:

$$y_t = x_t^T \beta + u_t \tag{19}$$

with the very special assumptions that the $x_t$ are known exactly, and

$$u_t = \varphi_1 u_{t-1} + \cdots + \varphi_p u_{t-p} + \epsilon_t \tag{19'}$$

where $\epsilon_t$ is a possibly heavy-tailed outlier producing mechanism. The
$\epsilon_t$ could be i.i.d., or an ARCH process. This setup includes the
special case of estimating location with non-Gaussian AR errors. Except for the
location case where some work has been done (Portnoy, 1977; Wegman and Carroll,
1977), this problem has not been studied at all in the previous literature.


5.  LEAST-SQUARES  AND  ROBUST  ESTIMATES  OF  AUTOREGRESSIONS 


Let's focus solely on the autoregression versions of IO and AO models, and
the AR ARCH models described in the previous section. Discussion of moving
average models is omitted here for the sake of brevity. A perfectly observed
Gaussian autoregression is regarded as the nominal model, with IO, AO and AR
ARCH models particular types of non-Gaussian deviations from this nominal
model.

Consider the pth-order autoregression version of the regression M-estimate
(8) for a $\mu = 0$ version of (12) and (13):

$$\sum_{t=p+1}^{n} z_t\, \psi\bigl(y_t - z_t^T \hat{\varphi}_M\bigr) = 0 \tag{20}$$

where $z_t^T = (y_{t-1}, \ldots, y_{t-p})$. This includes the least-squares
estimate $\hat{\varphi}_{LS}$ as a special case. Now $\hat{\varphi}_{LS}$ has a
rather notable property at finite variance IO models: its asymptotic covariance
matrix depends only upon $\varphi$, and not upon the distribution of the
$\epsilon_t$ (Whittle, 1962; Martin, 1982a). This was cited as a robustness
property by Whittle.

However, several points are in order. First of all, unlike $\hat{\varphi}_M$,
$\hat{\varphi}_{LS}$ lacks efficiency robustness at IO models (Martin, 1982).
Secondly $\hat{\varphi}_{LS}$ is disastrously non-robust toward AR ARCH models
(Nemec and Martin, 1983). We conjecture that $\hat{\varphi}_M$ is robust toward
AR ARCH models, but this remains to be established. More importantly, neither
$\hat{\varphi}_{LS}$ nor $\hat{\varphi}_M$ is robust toward AO models of either
the specific type (14) or the general type (15); both types of estimates suffer
from severe biases as well as inflated variances (Denby and Martin, 1979).

Since AO models are included in arbitrarily small Prohorov neighborhoods of a Gaussian autoregression (see, for example, Cox, 1981), both $\hat\varphi_{LS}$ and $\hat\varphi_M$ lack qualitative robustness! Following the comments made in conjunction with (11), we require estimating equations whose summands are bounded and continuous functions of the data, and this is not the case with the M-estimate defined by (20). The point is that AO models give rise to errors in the $z_t$ which can have quite potent effects.
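The bias of least squares under additive outliers is easy to see numerically. The following minimal sketch is not from the paper: it simulates a Gaussian AR(1) series, contaminates it with isolated outliers of an assumed AO form ($y_t = x_t + v_t$, with $v_t$ nonzero only a small fraction of the time), and compares the least-squares coefficient estimates. The function names, the contamination fraction 0.1 and the outlier scale 10 are illustrative assumptions.

```python
import random

def simulate_ar1(n, phi, rng, burn_in=200):
    """Gaussian AR(1): x_t = phi * x_{t-1} + eps_t, with eps_t ~ N(0, 1)."""
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(x)
    return out

def add_outliers(x, frac, scale, rng):
    """Assumed AO contamination: y_t = x_t + v_t, v_t = 0 with prob. 1 - frac."""
    return [xt + (rng.gauss(0.0, scale) if rng.random() < frac else 0.0)
            for xt in x]

def ls_ar1(y):
    """Least-squares AR(1) coefficient: sum y_t y_{t-1} / sum y_{t-1}^2."""
    num = sum(a * b for a, b in zip(y[1:], y[:-1]))
    den = sum(b * b for b in y[:-1])
    return num / den

rng = random.Random(0)
clean = simulate_ar1(5000, 0.8, rng)
dirty = add_outliers(clean, frac=0.1, scale=10.0, rng=rng)
phi_clean = ls_ar1(clean)   # close to the true value 0.8
phi_dirty = ls_ar1(dirty)   # pulled strongly toward zero by the outliers
print(phi_clean, phi_dirty)
```

The outliers inflate the denominator (they appear in the regressors $z_t$) without adding correlated signal to the numerator, which is why the estimate shrinks toward zero.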

Three classes of robust estimates have been proposed for this setup: (i) Bounded-Influence Autoregression (BIFAR); (ii) RA-Estimates; (iii) Robust Data Cleaning followed by Least-Squares. The first class utilizes bounded-influence regression type estimates, or generalized M-estimates (GM-estimates), applied to autoregressions. The two main variants are the Hampel-Krasker-Welsch version and the Mallows version (see Martin, 1981, and the references therein).

The second class of estimates, due to Yohai and Bustos (1982), are obtained as follows. First, one computes robust covariances $\tilde\gamma_k = \tilde\gamma_k(\varphi)$ of lag-$k$ residuals:

$$ \tilde\gamma_k = \frac{1}{n} \sum_t \psi(r_t, r_{t+k}) \tag{21} $$

where $r_t = r_t(\varphi) = y_t - (\varphi_1 y_{t-1} + \cdots + \varphi_p y_{t-p})$ are the residuals. Then the $\tilde\gamma_k$ are substituted for the conventional covariance estimates $\hat\gamma_k$, obtained when $\psi(r_t, r_{t+k}) = r_t \cdot r_{t+k}$, in the usual least-squares equations expressed in terms of $\hat\gamma_k$ (see Yohai and Bustos, 1982, for details).

Robustness is achieved by choosing the two-argument $\psi$ to be a bounded and continuous function on $R^2$. One choice is $\psi(u,v) = \psi(u)\psi(v)$ for some bounded, continuous one-argument function $\psi$ on $R^1$. The essential idea is that the estimates yield zero values for robust lag-$k$ correlation estimates of the residuals, for $k = 1, \ldots, p$, incorporated in a manner which results in high efficiency. Hence the name RA-estimates stands for (robust) residual-autocorrelation-based estimates.
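The robust lag-$k$ covariance in (21) can be sketched in a few lines for the product-form choice $\psi(u,v) = \psi(u)\psi(v)$. This is only an illustration under stated assumptions: Huber's clipped-identity psi-function with cutoff 1.5 is one possible bounded, continuous choice, the residuals are assumed to be already standardized by a robust scale, and the function names are hypothetical.

```python
def huber_psi(u, c=1.5):
    """Huber psi: identity on [-c, c], clipped outside (bounded, continuous)."""
    return max(-c, min(c, u))

def ar_residuals(y, phi):
    """r_t = y_t - (phi_1 y_{t-1} + ... + phi_p y_{t-p}), t = p+1, ..., n."""
    p = len(phi)
    return [y[t] - sum(phi[j] * y[t - 1 - j] for j in range(p))
            for t in range(p, len(y))]

def robust_lag_cov(r, k, psi=huber_psi):
    """(1/n) * sum_t psi(r_t) * psi(r_{t+k}); psi = identity gives the usual one."""
    n = len(r)
    return sum(psi(r[t]) * psi(r[t + k]) for t in range(n - k)) / n

r = [1.0, 100.0, 1.0]                         # one gross outlier in the residuals
print(robust_lag_cov(r, 1))                   # outlier contribution is clipped
print(robust_lag_cov(r, 1, psi=lambda u: u))  # conventional estimate, dominated by it
```

An RA-estimate would then plug these robust covariances into the least-squares equations in place of the conventional ones; that step is not shown here.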

The third class of estimates is obtained by iterative application of a robust smoother-cleaner to remove outliers, followed by application of the usual least-squares estimate (Kleiner, Martin and Thomson, 1979; Martin, Samarov and Vandaele, 1982; Martin and Thomson, 1982). The smoother-cleaner has the property that at a gross-outlier position (in the sense described in Section 1), the outlier is replaced by an interpolate based on all the other cleaned data. An algorithm for smoother-cleaners is given in Section 8. Robustness is obtained for this method by virtue of the smoother-cleaner being a bounded and continuous function of the data.

All  three  of  the  above  classes  of  estimates  may  be  modified  to  cover  the 
case  of  nominally  Gaussian  ARMA  models  with  varying  degrees  of  elegance,  and 
success  yet  to  be  fully  determined. 

A careful comparative study of the three approaches is not yet available. Yohai and Bustos (1982) should have good comparative results on classes (i) and (ii) for AR(1) and MA(1) models in the very near future. Both BIFAR and RA-estimates are consistent and highly efficient at the nominal Gaussian AR model (Fisher consistency), while being robust for well-chosen psi-functions. They are typically asymptotically normal as well, and have small biases at AO models (one might well call this latter feature bias robustness). I believe that the RA-estimates will be generally preferred to BIFAR estimates for at least two good reasons aside from their efficiency and bias robustness. Assuming the latter are on at least a roughly even par with BIFAR estimates, the RA-estimates (i) are quite natural for time series models, and can be applied in principle to models of considerable complexity, and (ii) can be designed with just one efficiency tuning constant whose values are relatively easy to determine (compare this with the difficulty involved in choosing tuning constants for BIFAR estimates implied by Peters, Samarov and Welsch's (1982) discussion in the general regression context).

The method of robust data-cleaning, followed by least squares in an iterative manner, is a quite natural and attractive one. Note, however, that it requires the use of a BIFAR or RA-estimate to provide a reasonably good starting point for iteration, as the overall procedure is highly nonlinear. It is even some kind of approximation to a non-Gaussian M.L.E. if an appropriate filter-smoother is used (Martin, 1981), and it fits in nicely with a robust prewhitening approach to spectral density estimation (Kleiner, Martin and Thomson, 1979; Martin and Thomson, 1982). The method has a drawback whose importance is somewhat debatable, namely that it is not Fisher consistent. This is certainly quite objectionable from a theoretical point of view, and there unfortunately seems to be no easy way to get around the problem other than through some form of adaption. This we intend to pursue in the near future. On the other hand, certain calculations show that the asymptotic bias at the nominal Gaussian model will be so small as to have little practical consequence (Martin and Thomson, 1982, Section 6).


6.  FULL  ADAPTION  VERSUS  ROBUSTNESS

During the course of the workshop for which this talk was prepared, the following extemporaneous remarks were made.

Some attention was given by several speakers to density estimation and score function approximation, where the (efficient) score function is $\psi = -f'/f$, $f$ being a density for presumably i.i.d. data. Such attention is presumably motivated by a desire to use blatantly adaptive methods. This prompted recollection of Stone's (1975) Monte Carlo results presented at the end of his asymptotic treatment of adaptive, asymptotically efficient, location estimates $\hat\mu$. These estimates are obtained by solving

$$ \sum_{i=1}^{n} \hat\psi_n\!\left(\frac{y_i - \hat\mu}{s}\right) = 0 \tag{22} $$

where $\hat\psi_n$ is an estimate of $\psi$ and $s$ is a robust scale estimate. Stone used $\hat\psi_n(r) = [-\hat f'_n(r)/\hat f_n(r)] \cdot d_n(r)$, where $\hat f_n$, $\hat f'_n$ are kernel density estimates using a Gaussian density type kernel, and $d_n(r)$ truncates $[-\hat f'_n(r)/\hat f_n(r)]$ to zero outside a symmetric interval $[-a_n, a_n]$ with $a_n \to \infty$ as $n \to \infty$.
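A truncated kernel score estimate of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not Stone's actual procedure: the bandwidth `h` and the truncation point `a` are left as free inputs because the text does not give his choices, and the function names are hypothetical.

```python
import math

def gauss_kernel(z):
    """Standard Gaussian density, used as the smoothing kernel K."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def score_estimate(r, sample, h, a):
    """Truncated kernel estimate of psi = -f'/f at the point r."""
    if abs(r) > a:                      # d_n(r): zero outside [-a, a]
        return 0.0
    n = len(sample)
    z = [(r - x) / h for x in sample]
    f_hat = sum(gauss_kernel(zi) for zi in z) / (n * h)
    # The Gaussian kernel satisfies K'(z) = -z K(z).
    fprime_hat = sum(-zi * gauss_kernel(zi) for zi in z) / (n * h * h)
    return -fprime_hat / f_hat

sample = [-2.0, -1.0, 0.0, 1.0, 2.0]
for r in (-1.0, 0.0, 1.0, 5.0):
    print(r, score_estimate(r, sample, h=1.0, a=3.0))
```

For a sample symmetric about zero the estimated score is odd in `r`, vanishes at the center, and is set to zero beyond the truncation point, which is what keeps the resulting estimating function bounded.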

A question frequently raised about such fully adaptive estimates is, "How large must $n$ be in order for the asymptotics to set in?" Somewhat surprisingly, $n$ need not be so large, as Stone's Monte Carlo for sample size $n = 40$ showed. His results give $EFF(\hat\mu, f) \geq 0.89$ for $f$ ranging over the Gaussian, Laplace, Contaminated Normal (contamination fraction = 0.1, contamination variance = 9) and Cauchy distributions.

While Stone's Monte Carlo results are quite encouraging, they need to be contrasted with the facts that (i) comparable results are achieved with a robust location M-estimate of the type (3) using a good $s$, an appropriate value for $c$, and a good redescending psi-function, for example Tukey's bisquare psi-function (see Mosteller and Tukey, 1977); and (ii) such an M-estimate is computationally much simpler than the fully adaptive estimate (22).
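The computational simplicity of such an M-estimate can be seen from a short sketch using iteratively reweighted averaging. Several things here are assumptions rather than the paper's specification: equation (3) is taken to have the form $\sum_i \psi((y_i - \mu)/(cs)) = 0$, the scale $s$ is the normalized median absolute deviation, and $c = 4.685$ is a commonly used bisquare tuning constant.

```python
import statistics

def bisquare_weight(u):
    """Weight psi(u)/u for Tukey's bisquare psi(u) = u (1 - u^2)^2, |u| < 1."""
    return (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0

def bisquare_location(y, c=4.685, tol=1e-10, max_iter=100):
    """Location M-estimate with a redescending psi, started at the median."""
    mu = statistics.median(y)
    # MAD, rescaled to be consistent for the standard deviation at the Gaussian
    s = 1.4826 * statistics.median([abs(v - mu) for v in y])
    if s == 0.0:
        return mu
    for _ in range(max_iter):
        w = [bisquare_weight((v - mu) / (c * s)) for v in y]
        new_mu = sum(wi * v for wi, v in zip(w, y)) / sum(w)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

y = [-1.0, -0.5, 0.0, 0.5, 1.0, 50.0]
print(statistics.mean(y), bisquare_location(y))
```

Because the psi-function redescends to zero, the gross outlier at 50 receives zero weight and has no influence on the final estimate, whereas the sample mean is dragged far from the bulk of the data.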


7.  ROBUSTNESS  AND  DEPENDENCY

In  this  section  we  wish  to  make  two  main  points.  The  first  is  that  relatively 
small  amounts  of  serial  correlation  can  seriously  affect  the  level  (or  false  alarm) 
of  a  test,  or  equivalently  the  error  rate  of  a  confidence  interval.  This  is  true 
even  in  the  completely  Gaussian  case,  where  it  is  a  surprisingly  unadvertised 
fact  that  tests  and  confidence  intervals  are  very  non-robust  toward  dependency. 
Here we use the word robust very loosely and intuitively; the definitions of qualitative robustness for time series given in Section 3 may need to be modified for this kind of problem.

The second point is made in connection with the very special model assumptions made in equations (19)-(19'). Namely, ordinary location M-estimates are not adequate for estimation of location with non-Gaussian autoregressive errors, unless the dependency is quite weak. They can be quite inefficient compared with proper M-estimates, i.e., true M.L.E. type estimates for the actual model. Similar comments apply to problems of linear regression with non-Gaussian autoregressive errors.


The  Student's  t  Confidence  Interval  with  Dependency 

Consider the usual Student's t 95% confidence interval, which has a nominal error rate of 5%: $CI = (\bar y - t_{.025,n-1} S/\sqrt{n},\ \bar y + t_{.025,n-1} S/\sqrt{n})$, where $\bar y$ is the sample mean of $y_1, y_2, \ldots, y_n$, and $S^2$ is the usual sample variance estimate. Suppose that in fact the $y_t$ are given by the special case of (19)-(19') where $x_t^T \beta = \mu$, a location parameter, and that $u_t$ in (19') is a zero-mean Gaussian AR(1) process with transition parameter $\varphi$. If in fact $\varphi = 0$, then CI has the stated error rate of 0.05. However, when $\varphi \neq 0$ and the sample size is large, the results are as follows:

    $\varphi$    Error Rate
    0.25         0.13
    0.5          0.27
    0.7          0.42
    0.9          0.66

The results are dramatic. For $\varphi = 0.25$ the error rate has more than doubled, and things get rapidly worse with increasing $\varphi$. The problem is that as $n \to \infty$

$$ S^2 \to \mathrm{VAR}\, y_t = \frac{\sigma^2}{1-\varphi^2} \;\neq\; \mathrm{VAR}_\infty \sqrt{n}\,\bar y = \frac{\sigma^2}{(1-\varphi)^2} = S_u(0) \tag{23} $$

where $\mathrm{VAR}_\infty$ denotes the asymptotic variance, and $S_u(f)$ is the spectral density for the error process $u_t$. It should be noted that the right-hand equalities hold quite generally; we need not restrict ourselves to AR or even ARMA processes (Grenander, 1981). What we need in order to studentize $\bar y$ with dependency present is an estimate of $S_u(0)$, the spectral density of the error process at the origin. The same is true with regard to setting the threshold for tests.
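The large-sample error rates can be checked directly from (23). The sketch below is an illustration, not the author's computation: it assumes the t quantile may be replaced by the normal quantile for large $n$, so that the studentized mean is asymptotically normal with variance equal to the ratio of the two variances in (23), namely $(1+\varphi)/(1-\varphi)$. The function name is hypothetical.

```python
from statistics import NormalDist

def t_interval_error_rate(phi, nominal=0.05):
    """Large-sample error rate of the nominal interval under Gaussian AR(1) errors."""
    z = NormalDist().inv_cdf(1.0 - nominal / 2.0)
    # From (23): [sigma^2/(1-phi^2)] / [sigma^2/(1-phi)^2] = (1-phi)/(1+phi),
    # so the interval is too short by the square root of this factor.
    shrink = ((1.0 - phi) / (1.0 + phi)) ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(z * shrink))

for phi in (0.0, 0.25, 0.5, 0.7, 0.9):
    print(phi, round(t_interval_error_rate(phi), 3))
```

Under these assumptions the computed rates agree with the tabulated values to within about 0.02, and the nominal 0.05 is recovered at $\varphi = 0$.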

Heidelberger and Welch (1980) have studied nonparametric methods for doing this. The author and a student have checked the behavior of autoregressive type estimates of $S_u(0)$ with Akaike's (1977) order selection rule AIC, in a casual way via Monte Carlo. This also seems to work, with the proviso that jackknifing must be done to remove the $O(n^{-1})$ bias in the autoregressive coefficient estimates if the sample size is not large enough relative to the amount of correlation (this remains to be determined with care, but for an AR(1) process, $\varphi = 0.8$ and $n = 50$ definitely requires such bias removal).
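An autoregressive estimate of $S_u(0)$ can be sketched as follows for the simplest case. This is a simplified illustration with several assumptions: the order is fixed at one rather than chosen by AIC, no jackknife bias correction is applied, the coefficient is the Yule-Walker estimate, and the function names are hypothetical.

```python
import random

def ar1_spectrum_at_zero(y):
    """Fit AR(1) to the centered data; return (phi_hat, estimate of S_u(0))."""
    n = len(y)
    ybar = sum(y) / n
    d = [v - ybar for v in y]
    c0 = sum(v * v for v in d) / n
    c1 = sum(a * b for a, b in zip(d[1:], d[:-1])) / n
    phi = c1 / c0                       # Yule-Walker estimate
    sigma2 = c0 * (1.0 - phi * phi)     # innovations variance
    return phi, sigma2 / (1.0 - phi) ** 2

def location_interval(y, z=1.96):
    """Large-sample 95% interval for the mean, using S_u(0) in place of S^2."""
    n = len(y)
    ybar = sum(y) / n
    _, s0 = ar1_spectrum_at_zero(y)
    half = z * (s0 / n) ** 0.5
    return ybar - half, ybar + half

# Simulated Gaussian AR(1) with phi = 0.5, sigma^2 = 1, so S_u(0) = 4 by (23).
rng = random.Random(1)
x, series = 0.0, []
for _ in range(5000):
    x = 0.5 * x + rng.gauss(0.0, 1.0)
    series.append(x)
phi_hat, s0_hat = ar1_spectrum_at_zero(series)
print(phi_hat, s0_hat, location_interval(series))
```

With a long series the estimate settles near the true value; for short, strongly correlated series the downward bias in the coefficient estimate carries over to the estimate of $S_u(0)$, which is the situation the jackknife remark above addresses.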


Robust  Estimation  of  Location

P. Huber's (1964) M-estimates $\hat\mu_{OM}$ of location, obtained by solving (3), were introduced in the context of independent and identically distributed observations $y_t$. The new subscript notation "OM" stands for ordinary location M-estimate, for reasons which will become obvious shortly. The behavior of $\hat\mu_{OM}$ when the $y_t$ are both dependent and non-Gaussian has received relatively little attention. However, some relatively recent work includes that of Portnoy (1977) and Wegman and Carroll (1977). The main conclusions of Portnoy's work are: (i) if the $y_t$ have only weak correlation structure, then $\hat\mu_{OM}$ has high absolute efficiency for heavy-tailed distributions associated with moving-average type errors; (ii) weak dependency and heavy-tailedness seem to motivate the use of a redescending psi-function.

Unfortunately, ordinary location M-estimates cannot compete with proper location M-estimates with non-Gaussian ARMA model errors when the correlation structure is moderate to strong. By proper M-estimate we mean true maximum-likelihood type estimates appropriate for the model. These are obtained as follows.

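Let $y_t$ be given by the location model special case of (19)

$$ y_t = \mu + u_t \tag{24} $$

where the $u_t$ are now an ARMA($p$, $q$) generalization of the process (19')

$$ u_t + \varphi_1 u_{t-1} + \cdots + \varphi_p u_{t-p} = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} \tag{24'} $$

Heavy-tailed distributions $F$ for the $\varepsilon_t$ give rise to outliers in the $\varepsilon_t$, and hence in the $u_t$ and $y_t$. This model may be written in the equivalent form

$$ y_t + \varphi_1 y_{t-1} + \cdots + \varphi_p y_{t-p} = \gamma + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} \tag{25} $$

where the expression for the intercept $\gamma$ is

$$ \gamma = \mu \Bigl( 1 + \sum_{i=1}^{p} \varphi_i \Bigr) \tag{26} $$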

Let $\alpha = (\gamma, \varphi, \theta)$ denote the true parameter vector for (24)-(24') or (25)-(26), and let $\alpha'$ denote an arbitrary value in the region where the process $y_t$ is stationary and invertible. For a given $\alpha'$ one can generate residuals $r_t(\alpha')$ from the recursion, using appropriate initial conditions, in the usual way (see, for example, Box and Jenkins, 1976). An M-estimate $\hat\alpha$ of $\alpha$ is a solution of the minimization problem

$$ \min_{\alpha'} \sum_{t=1}^{n} \rho\!\left( \frac{r_t(\alpha')}{c\,s} \right) \tag{27} $$

For $\rho(r) = -\log f(r)$, this yields a conditional maximum likelihood estimate (conditioned on $y_1, \ldots, y_p$ and the initial conditions for the $\varepsilon_t$), which is asymptotically efficient. Consistency and asymptotic normality of "one-step" M-estimates are established in Lee and Martin (1983).

Now, given the M-estimate $\hat\alpha = (\hat\gamma, \hat\varphi, \hat\theta)$, the relation (26) leads to the proper location M-estimate

$$ \hat\mu = \frac{\hat\gamma}{1 + \sum_{i=1}^{p} \hat\varphi_i} \tag{28} $$

In the special case where $\rho(r) = -\log f(r)$ this yields the conditional M.L.E. of $\mu$. The above estimate is the one which is really the appropriate M-estimate of $\mu$ for the model (24)-(24').

Detailed comparisons of the asymptotic and finite sample behaviors of $\hat\mu_{OM}$ and $\hat\mu$ are given for AR(1) and MA(1) models by Lee and Martin (1983). It is shown that the efficiency of $\hat\mu_{OM}$ can be quite small relative to that of $\hat\mu$.


Robust Estimation of Signal Parameters

The regression model (19) contains as special cases some of the classical models of communication theory, where one is estimating signal parameters. For example, estimation of signal amplitude deals with the case $x_t^T \beta = \beta \cos 2\pi f_0 t$, while estimation of signal amplitude and phase is based on the case where $x_t^T \beta = \beta_1 \cos 2\pi f_0 t + \beta_2 \sin 2\pi f_0 t$. For these models it turns out that the ordinary least-squares estimates are asymptotically efficient when the $\varepsilon_t$ in (19') are Gaussian, and even under much more general assumptions for Gaussian $u_t$ (Grenander and Rosenblatt, 1957; Grenander, 1981).

However, when the $\varepsilon_t$ are non-Gaussian and heavy-tailed, the situation is much the same as in the location problem just discussed. An alternative to least squares is required, but ordinary M-estimates lack efficiency robustness. One requires a proper M-estimate geared to the model (19)-(19'), and such estimates are unfortunately a bit more complicated than in the simple case of estimating location. One possibility for computing proper M-estimates for regression models with non-Gaussian AR errors is via a straightforward robustification of Durbin's (1960) two-stage least-squares procedure. Details may be found in Martin (1982b).


8.  ROBUST  DATA  SMOOTHER  CLEANERS

As was mentioned in Section 5, so-called smoother-cleaners form a building block for robust parameter estimation. They also form a basis for robust spectral estimation via a robust prewhitening approach. Since details are provided in the references cited in Section 5, only the briefest of descriptions and an example are provided here.

Consider the AO model (14), with $x_t$ an AR($p$) process having a state-variable representation $X_t = \Phi X_{t-1} + U_t$, with $x_t = (X_t)_1$ being the first component of the $p$-vector $X_t$, and similarly $\varepsilon_t = (U_t)_1$. In the first pass the data $y_t$ are processed in forward time with the filter-cleaner algorithm

$$ \hat X_t = \Phi \hat X_{t-1} + m_t s_t\, \psi\!\left( \frac{y_t - \hat y_t^{\,t-1}}{s_t} \right) \tag{29} $$

where

$$ \hat y_t^{\,t-1} = (\Phi \hat X_{t-1})_1 \tag{30} $$

is a robust one-step-ahead predictor, as was mentioned in Section 1; here $\psi$ is bounded and continuous, and the "gain" $m_t$ and the time-varying scale $s_t$ are computed from auxiliary recursions. In essence (29)-(30) is a robustified Kalman filter with data-dependent gain and scale sequences.
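To make the structure of (29)-(30) concrete, here is a scalar sketch for an AR(1) state. It is a simplification under loudly stated assumptions and not the algorithm of the cited references: the auxiliary recursions for the gain and scale are not given in this paper, so the ones below (a prediction variance updated by the weight $\psi(u)/u$, with zero nominal observation noise so that the normalized gain is one) are assumed, as are the two-part redescending psi-function and all function names.

```python
def hampel_psi(u, a=2.0, b=3.0):
    """Two-part redescending psi: identity on [-a, a], linear descent to 0 at |u| = b."""
    au = abs(u)
    if au <= a:
        return u
    if au >= b:
        return 0.0
    return (a * (b - au) / (b - a)) * (1.0 if u > 0 else -1.0)

def filter_cleaner_ar1(y, phi, sigma2, psi=hampel_psi):
    """Scalar robustified Kalman filter for x_t = phi x_{t-1} + eps_t, y_t = x_t + v_t."""
    x_hat = 0.0
    p = sigma2 / (1.0 - phi * phi)      # stationary variance as initial uncertainty
    cleaned = []
    for yt in y:
        pred = phi * x_hat              # one-step-ahead prediction, as in (30)
        m = phi * phi * p + sigma2      # assumed prediction-error variance recursion
        s = m ** 0.5                    # time-varying scale s_t
        u = (yt - pred) / s
        x_hat = pred + s * psi(u)       # scalar analogue of (29), unit normalized gain
        w = psi(u) / u if u != 0.0 else 1.0
        p = m * (1.0 - w)               # a rejected point leaves the uncertainty at m
        cleaned.append(x_hat)
    return cleaned

y = [1.0, 0.5, 0.25, 10.0, 0.06, 0.03]
print(filter_cleaner_ar1(y, phi=0.5, sigma2=1.0))
```

Observations whose prediction residuals are small pass through unchanged, while the gross outlier at the fourth position is replaced by its one-step-ahead prediction, which is the qualitative behavior described for the filter-cleaner.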

The smoother-cleaner output is then obtained by the reverse-time pass

$$ \hat X_t^{\,n} = \hat X_t + A_t \bigl( \hat X_{t+1}^{\,n} - \Phi \hat X_t \bigr), \qquad t = n-1, n-2, \ldots, 1 \tag{31} $$

with initial condition $\hat X_n^{\,n} = \hat X_n$. Here the $\hat X_t$ come from (29), and the $A_t$ are computed from quantities appearing in the auxiliary recursions for (29). This algorithm is a robustified form of the optimal linear smoothing algorithm due to Meditch (1969).

As an example of the efficacy obtainable through use of the smoother-cleaner (29)-(30), consider the glint noise sample path in Figure 1. These highly spiky non-Gaussian data are obtained from radar measurements of the position of an aircraft target. The composite, reverberation-like nature of the radar return is the cause of the glint spikes, which result in an unnecessarily high observation-noise variance at the input of a target tracking loop. These spikes can be nicely eliminated, and the observation noise level thereby tremendously reduced, by use of a smoother-cleaner, as shown in Figure 2, where a 3rd-order autoregressive approximation for the data was used. For details concerning the application of smoother-cleaners to glint noise data, see Section VII of Martin and Thomson (1982).